The development of addictive behavior is marked by a loss of behavioral flexibility. In 
part, this is due to an increase in the ability of environmental stimuli to elicit responding 
and decreased importance of the action-outcome relationship in behavioral control. It has 
previously been demonstrated that both inactivation of and dopamine (DA) infusions in the 
infralimbic prefrontal cortex (PFC) can restore behavioral flexibility in paradigms measuring 
habitual reward seeking. Here, we investigated the mechanism by which cortical DA 
would act to enable goal-directed actions after the transition to habitual behavior has 
been established. Further, we extended this work to include a novel mouse model of 
compulsive-like behavior in which we assessed reward seeking despite the possibility of 
adverse consequences. Our data show that DA receptor D1 inhibition or D2 activation 
both promote the expression of flexible responding after the development of habitual or 
compulsive-like behavior, and we suggest that the ability of DA infusions in the infralimbic 
PFC to restore sensitivity to changes in outcome value depends on activation of DA D2 
receptors. 

INTRODUCTION 

The transition from casual drug use to addiction is character- 
ized by increasing loss of control over reward seeking. When a 
behavior is first learned, performance of the action is guided by its 
relationship to its outcome — i.e., a response is made in order to 
gain access to a reinforcer. Over time and after repeated execution, 
behavior transitions from goal-directed action to stimulus-driven 
habitual behavior (Dickinson, 1985). Habitual reward seeking 
is no longer mediated by action-outcome relationships or by a 
representation of the value of an outcome; rather, habitual behav- 
ior is automatically elicited by environmental cues and stimuli 
(cf. Yin et al., 2008). In addition to habits, addictive behav- 
ior also involves the development of compulsive reward seeking 
that occurs despite adverse consequences (e.g., Everitt et al., 
2008; Heyne et al., 2009). Successful treatment of addiction may 
require restoration of the ability to update behavior in accor- 
dance with changed contingencies and in the face of negative 
outcomes. 

The shift in response strategy away from flexible, contingency- 
mediated behavior to one in which stimulus-response 
relationships guide behavior is paralleled by a change in the 
neuroanatomical substrates that mediate behavior from a 
prefrontal-striatal circuit in which the prefrontal cortex (PFC) 
monitors the action-outcome relationship, to a more dorsal 
circuit involving dorsolateral striatum (e.g., Yin and Knowlton, 
2006; Balleine and Dickinson, 1998). However, a role for the 
infralimbic PFC (IL) in the expression of habitual behavior 
has been demonstrated. When the IL, which projects to the 
nucleus accumbens shell (e.g., McGeorge and Faull, 1989) and 
amygdala (Sesack et al., 1989), is lesioned prior to response 
acquisition, animals are unable to express stimulus-response 
habits (Killcross and Coutureau, 2003). After extended training, 
IL-lesioned animals remain sensitive to changes in outcome value. 
Importantly, later research expanded on this finding to show that 
inactivation of the IL after extended training, at a time point 
where intact animals are habitual, resulted in the restoration 
of flexible behavior (Coutureau and Killcross, 2003). More 
recent work has expanded upon these findings using optogenetic 
manipulations to investigate online regulation of the IL in the 
expression of habitual behavior (Smith et al., 2012). Together, 
these data suggest that the IL is critically involved in the selection 
of response strategy in situations of conflict between automatic, 
habitual behaviors and flexible goal-directed actions. 

Dopamine (DA) signaling within corticostriatal circuitry has 
been shown to play a unique role in both the formation and 
expression of goal-directed vs. habitual instrumental behavior 
(e.g., Nelson and Killcross, 2006). Our lab has shown that infu- 
sions of exogenous DA in the IL, but not the more dorsal 
prelimbic PFC (PL), restored sensitivity to outcome devaluation 
after extended training (Hitchcott et al., 2007). While a major- 
ity of these studies were performed in rats, we have found using 
lesion studies that the neuroanatomical mechanisms underly- 
ing habit learning are preserved in mice (Quinn et al., 2013). 
The mechanism by which both inactivation of and DA infusion 
into the IL can restore sensitivity to the action-outcome rela- 
tionship is unknown in rodents. Here, we assessed the ability of 
DA D1- and D2-family-specific manipulations in the IL to restore 
flexible behavior as measured by either sensitivity to changes 
in action-outcome contingency or reduction of compulsive-like 
reward-seeking behavior in mice. 

MATERIALS AND METHODS 
SUBJECTS 

Male C57BL/6 mice were supplied from Charles River and deliv- 
ered to the Yale University/Connecticut Mental Health Center 
mouse vivarium between 56 and 70 days of age. These mice were 
allowed to acclimate for 2 weeks with ad libitum access to food 
and water. All behavioral procedures were approved by the Yale 
University IACUC and experiments were performed in accor- 
dance with the National Institutes of Health Guide for the Care and 
Use of Laboratory Animals. After acclimation, mice were food 
restricted to 90-92% of free feeding weight for all experiments. 
They had limited access to standard chow in their homecage each 
day, several hours after training. The amount of food provided 
was adjusted to maintain weights. Homecage chow was distinct 
from the purified grain pellets used in both the habitual and 
compulsive-like food-seeking experiments. There were approxi- 
mately 5-12 animals in each experimental group after exclusion 
of mice with inaccurate cannula placement or loss/clogging of 
cannula during the course of the experiments. Saline groups 
were larger (n > 12) because a cohort of control (saline) animals was 
included in each testing session to ensure baseline effects were 
consistent. 
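
As a worked illustration of the food-restriction criterion described above (90-92% of free-feeding weight), the target window for an individual mouse can be computed as follows. This is a minimal sketch: the function name and the 25 g example weight are hypothetical and are not taken from the study.

```python
def target_weight_range(free_feeding_g, low=0.90, high=0.92):
    """Return the (min, max) body weight in grams for 90-92% food restriction."""
    if free_feeding_g <= 0:
        raise ValueError("free-feeding weight must be positive")
    return (free_feeding_g * low, free_feeding_g * high)


# Hypothetical example: a mouse that weighed 25 g under free feeding.
low_g, high_g = target_weight_range(25.0)
print(f"maintain body weight between {low_g:.1f} g and {high_g:.1f} g")
```

In practice, the daily chow allotment would then be adjusted so that each animal's measured weight stays inside this window.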

INSTRUMENTAL CONDITIONING CHAMBERS 

Instrumental chambers were identical to those described by 
Barker et al. (2012). Briefly, 12 mouse instrumental chambers, 
housed within a sound-attenuating box, were used for these 
experiments (Med Associates; Georgia, VT). Each chamber was 
equipped with a 28 V house light located at the top of the mid- 
dle panel on the left side wall, three adjacent nosepoke apertures 
located at the bottom of the left side wall, and a magazine located 
at the bottom of the middle panel on the right side wall. Grain 
pellets were delivered to this magazine, on the wall opposite the 
nosepokes. The nosepoke apertures and the reinforcement magazine 
were each equipped with a light and a photobeam sensor. A fan 
provided background noise and 
ventilation. 

STEREOTAXIC SURGERY 

Mice were anesthetized using ketamine/xylazine. Bilateral can- 
nulae (Plastics One; Roanoke, VA) were implanted and mounted 
to the skull using standard stereotaxic techniques. Cannulae were 
targeted to the IL at AP +1.7, ML ±0.25, DV -3.0 from bregma 
based on coordinates from Wall et al. (2004). For compulsive-like 
food-seeking experiments, surgeries were performed prior to any 
training. For instrumental habit experiments, surgeries were per- 
formed after 3 days of fixed ratio (FR) 1 training to reduce the 
amount of time between cannula placement and testing. 

DRUGS AND INFUSIONS 

For tests of habitual and compulsive-like food-seeking, mice 
received two infusions of the same drug, one prior to a control 
session and one prior to an experimental session. Infusions were 
0.2 µL over 2 min; internal cannulae were left in place for an 
additional 2 min to allow for diffusion. This volume and diffusion 
duration were chosen based on the literature and our pilot data 
using thionin, which suggested minimal spread to surrounding 
tissues at this volume and after the delayed removal of cannulae. 
Drugs used were the D1 agonist dihydrexidine HCl (DHX; Tocris; 
Minneapolis, MN), the D1 antagonist SCH23390 (Sigma; St. Louis, 
MO), the D2 agonist quinpirole (Tocris) in saline, and the D2 
antagonist sulpiride (Tocris) in acidified saline, each dissolved 
at 5 nmol per 1 µL. 
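
The infusion parameters above imply a delivered amount and a flow rate that can be checked with simple arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the text does not state whether the 0.2 µL volume applies to each cannula of the bilateral pair, so the result is expressed per infusion.

```python
# Values taken from the text; names are hypothetical.
CONCENTRATION_NMOL_PER_UL = 5.0  # each drug dissolved at 5 nmol per 1 µL
INFUSION_VOLUME_UL = 0.2         # volume delivered per infusion
INFUSION_DURATION_MIN = 2.0      # infusion delivered over 2 min


def infused_amount_nmol(concentration_nmol_per_ul, volume_ul):
    """Amount of drug delivered in one infusion (nmol)."""
    return concentration_nmol_per_ul * volume_ul


def infusion_rate_ul_per_min(volume_ul, duration_min):
    """Flow rate of the infusion (µL/min)."""
    return volume_ul / duration_min


amount = infused_amount_nmol(CONCENTRATION_NMOL_PER_UL, INFUSION_VOLUME_UL)
rate = infusion_rate_ul_per_min(INFUSION_VOLUME_UL, INFUSION_DURATION_MIN)
print(f"{amount:.1f} nmol per infusion at {rate:.1f} µL/min")
```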

INSTRUMENTAL TRAINING 

During training, one nosepoke was assigned as the active nose- 
poke, where a response resulted in reinforcer delivery, and the 
others designated as inactive nosepokes. Training consisted of 
1 day of magazine training, 3 days of fixed ratio 1 (FR1) training (in 
which each active response resulted in reinforcer delivery), 
3 days of random interval 30-s (RI30) training, and 6 RI60 sessions. 
In RI sessions, reinforcement could be earned every 30 (RI30) 
or 60 (RI60) s on average. The actual duration of each interval 
was randomly determined so that reinforcement availability was 
not predictable. The first active response (nosepoke) after the 
interval ended resulted in reinforcer delivery; the duration of the 
next interval was then generated automatically. During each daily 
training session, the house light and fan were on. All sessions were 
30 min in duration. 
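
The random interval contingency described above can be sketched in a few lines. This is not the Med-PC program used in the experiments. It is a minimal illustration that assumes intervals are drawn from an exponential distribution with the stated mean (the text says only that durations were "randomly determined"), and all function names are hypothetical.

```python
import random


def draw_interval(mean_s, rng):
    """Draw one interval in seconds; the exponential form is an assumption."""
    return rng.expovariate(1.0 / mean_s)


def simulate_ri_session(response_times_s, mean_s, session_s=1800.0, seed=0):
    """Return the times (s) of active responses that earned a reinforcer."""
    rng = random.Random(seed)
    armed_at = draw_interval(mean_s, rng)  # when the first reinforcer becomes available
    earned = []
    for t in sorted(response_times_s):
        if t > session_s:
            break
        if t >= armed_at:
            # First response after the interval has elapsed is reinforced,
            # and the next interval is generated from that moment.
            earned.append(t)
            armed_at = t + draw_interval(mean_s, rng)
    return earned


# Hypothetical animal responding once per second for a 30-min RI30 session.
responses = [float(s) for s in range(1, 1801)]
reinforced = simulate_ri_session(responses, mean_s=30.0)
print(len(reinforced), "reinforcers earned")
```

Because reinforcement depends on elapsed time rather than on the number of responses, responding faster earns little additional reward under this schedule, which weakens the experienced action-outcome relationship.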

CONTINGENCY DEGRADATION TEST 

During degradation sessions, conditions were identical to train- 
ing except that the grain pellet reinforcer was delivered on a 
non-contingent schedule determined by each individual animal's 
reinforcement rate on the day prior. Reinforcer delivery was 
spaced equivalently across the 30-min session. Responses on the 
active and inactive nosepokes were recorded, but did not result in 
reinforcer delivery. Infusions of drugs occurred 5 min prior to the 
start of the degradation session. Mice were assigned to infusion 
groups by matching baseline response rates, and received a 0.2 µL 
infusion of either saline (n = 17), DHX (n = 11), SCH23390 
(n = 6), quinpirole (n = 12), or sulpiride (n = 9). More ani- 
mals were in the saline group because a cohort of saline animals was 
included with each behavioral test session to confirm that baseline 
effects were replicated. Data were compared to a non-degraded 
session in which the animals received the same drug; the order 
of these sessions was counterbalanced, and animals received one 
normal RI60 training session, during which no drug was 
administered, between the two test sessions. 
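
The non-contingent delivery schedule used in the degradation session can be illustrated as follows. This is a sketch under stated assumptions: the text specifies only that deliveries were matched to the previous day's reinforcement rate and spaced evenly across the 30-min session, so placing the first delivery one full spacing after session start is an assumption, and the function name is hypothetical.

```python
def yoked_delivery_times(n_reinforcers_prior_day, session_s=1800.0):
    """Evenly spaced, response-independent delivery times (s) for one animal."""
    if n_reinforcers_prior_day <= 0:
        return []
    spacing = session_s / n_reinforcers_prior_day
    # Assumption: first pellet arrives one spacing into the session.
    return [spacing * (i + 1) for i in range(n_reinforcers_prior_day)]


# Hypothetical animal that earned 30 pellets on the preceding RI60 day.
times = yoked_delivery_times(30)
print(times[:3], "...", times[-1])
```

Nosepokes are still recorded during such a session, but they no longer change when pellets arrive, which is what degrades the action-outcome contingency.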

COMPULSIVE-LIKE FOOD-SEEKING TRAINING AND TEST 

Additionally, we assessed the effects of IL DA receptor modu- 
lation on compulsive-like behavior in mice using a modifica- 
tion of traditional conditioned place preference/aversion testing. 
Conditioning chambers were standard three chamber boxes with 
retractable doors (Med Associates; Georgia, VT). Chambers had 
distinct walls (vertical black and white stripes or diagonal marble 
and black stripes) and floors (wire mesh or grid). The two con- 
ditioning chambers were separated by a neutral, gray chamber. 
Photocell beam breaks were used to calculate time spent in each 
chamber, latency to enter the chamber, and number of entries using 
Med-PC IV software. During a single habituation session, mice 
were placed in the neutral chamber with both doors retracted 
such that mice could freely explore all chambers. During condi- 
tioning, mice were confined to the "paired" chamber for 30 min 
with access to 30 grain pellets on days 1, 3, and 5. On days 2, 4, 
and 6, mice were confined to the opposite chamber for 30 min 
with an empty food dish. 

On day 7, mice received an infusion of either a DA D1- or D2- 
like receptor agonist or antagonist 5 min prior to being placed in 
the neutral chamber with both doors retracted and were allowed 
to freely explore all chambers for 5 min. This duration was cho- 
sen because we were able to examine entry into both chambers 
and latency to enter, but no extinction was expected to occur 
based on our preliminary data. Mice received a 0.2 µL infusion of 
either saline (n = 20), DHX (n = 8), SCH23390 (n = 7), quinpi- 
role (n = 8), or sulpiride (n = 7). Latency to enter the chambers 
was the primary outcome measure. 

On the following day, mice were confined to the food-paired 
chamber. Two minutes after placement, mice received a 2 s, 
0.8 mA foot shock. Mice remained in the chamber for 60 s 
after the shock was terminated and were then returned to their 
homecage. On day 9, mice received a second infusion of the same 
drug as day 7. Five min after the infusion, they were returned 
to the gray chamber and allowed to freely enter both cham- 
bers and latency to enter the chambers was assessed in this 
20 min session. Latency was selected as the primary measure 
of compulsive-like behavior because it was not expected to be 
impacted by the extinction of either the association of the cham- 
ber with footshock or the association with the food reward which 
may be differentially impacted by prefrontal DA manipulations. 
Importantly, a change in the parameters of the training condi- 
tions might have an impact on the expression of reward-seeking 
under conflict between reward seeking and avoidance of negative 
consequences, either by increasing the aversive component (e.g., 
through increasing the shock intensity), the value of the reward, 
or the extent of learning (e.g., through extended training). 

CONFIRMATION OF PLACEMENT 

After behavioral assessment was complete, mice were sacrificed 
and tissue was fixed in paraformaldehyde for confirmation of 
cannula placement and location of the infusion tip using stan- 
dard histological techniques. If cannulae were not clogged at the 
time of sacrifice, thionin was infused at the volume and rate used 
for testing (0.2 µL over 2 min). If cannulae had become clogged, 
cannula tracts and tips were confirmed. Mice were excluded 
if placement could not be confirmed to be in the IL through 
the use of neuroanatomical landmarks, including white matter 
tracts. 

STATISTICS 

Data were analyzed with JMP Software (SAS Institute) using 
repeated measures analysis of variance (ANOVA). Significant 
interactions were further analyzed using Tukey's HSD post-hoc 
tests. 

RESULTS 

CONTINGENCY DEGRADATION 

Data were square root transformed to maintain homogeneity 
of variance. To determine whether agonism and/or antagonism 
of DA D1 or D2 receptors influenced sensitivity to changes in 
action-outcome relation, active responding during a degraded 
session was compared to responding during a non-degraded 
session; during both test sessions the experimental drug was on 
board. Importantly, no differences in baseline response rates were 
seen in animals to be assigned to groups [F(4, 50) = 1.122, p = 
0.356]. Additionally, rmANOVA [drug × non-degraded session 
("no drug" vs. "drug")] revealed no differences in response rates 
between the "drug" and "no drug" non-degraded sessions 
(p > 0.5 for main effects, p = 0.185 for session × drug 
interaction). Repeated measures ANOVA revealed a significant 
session (degraded vs. non-degraded) × drug interaction on active 
responding [F(4, 46) = 2.92, p < 0.05]. Post-hoc analyses indi- 
cated that responding of the saline-injected animals did not differ 
significantly between the degraded and non-degraded session, 
indicating that under basal conditions animals were insensitive 
to the change in action-outcome relations, consistent with the 
formation of habit. Critically, responding during the degraded 
session differed significantly from the non-degraded session only 
for mice receiving the DA D1 receptor antagonist SCH23390 
(p < 0.05) or the DA D2 receptor agonist quinpirole (p < 0.05; 
Figure 1). Together these data demonstrate that only antago- 
nism of the D1 receptor or agonism of the D2 receptor in the 
IL was sufficient to restore sensitivity to changes in the action- 
outcome relationship, indicative of goal-directed instrumental 
behavior. Mice receiving the DA D1 receptor agonist DHX or 
DA D2 receptor antagonist sulpiride did not show differential 
responding between the degraded and non-degraded sessions, 
confirming that these opposing DA receptor manipulations do 
not impact sensitivity to changes in contingency after extended 
training. 
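
The square root transformation and the within-subject comparison described above can be illustrated with a short sketch. It does not reproduce the repeated measures ANOVA, which was run in JMP. The function names are hypothetical and the response counts are made up purely for illustration; they are not data from this study.

```python
import math


def sqrt_transform(counts):
    """Square-root transform response counts to stabilize variance."""
    return [math.sqrt(c) for c in counts]


def session_difference(nondegraded, degraded):
    """Per-animal difference (non-degraded minus degraded) on the sqrt scale."""
    if len(nondegraded) != len(degraded):
        raise ValueError("each animal needs one value per session")
    pairs = zip(sqrt_transform(nondegraded), sqrt_transform(degraded))
    return [a - b for a, b in pairs]


# Made-up active-response counts for three animals (illustration only).
diffs = session_difference([100, 64, 81], [25, 64, 49])
print(diffs)
```

A positive difference means the animal responded less when the contingency was degraded, the pattern expected of goal-directed behavior; a difference near zero is the pattern expected of habitual responding.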

Because animals received infusions of the same drug during 
both test sessions and we used a within subjects analysis to assess 
responding, we are confident that the marked differences seen 
between the degraded and non-degraded sessions with either 
the SCH23390 or the quinpirole infusions reflected a change in 
response strategy. We do not believe this reduction in respond- 
ing in the degraded session, which is evidence for goal-directed 
instrumental action, is related to non-specific alterations in task 
engagement, motivation, or locomotion, as this would have 
been reflected as behavioral changes in both the degraded and 
non-degraded test conditions. 

COMPULSIVE-LIKE FOOD SEEKING 

To assess the effect of DA receptor manipulations on a novel 
measure of compulsive-like reward-seeking behavior, we com- 
pared the latency to enter the food reward-paired chamber after 
training, but prior to shock (pre-shock) with the latency after 
the animals had received a foot shock in the reward-paired 
chamber (post-shock). A repeated measures ANOVA revealed a 
significant session (pre-shock vs. post-shock) × drug interac- 
tion on latency to enter the reward-paired chamber [F(4, 46) = 
2.8205, p < 0.05]. Post-hoc analyses indicated that only animals 
that received SCH23390 or quinpirole infusions had post-shock 
latencies that were significantly increased compared to saline- 
infused animals (p < 0.05 and p < 0.01, respectively; Figure 2). 
Neither of these drugs impacted pre-shock latencies, indicating 
that DA receptor D1 antagonism or D2 agonism increased the 
latency to enter the reward-paired chamber only after that cham- 
ber had been paired with a negative consequence. Additionally, 
administration of quinpirole or SCH23390 did not impact the 
time spent in the reward-paired chamber in either the pre- or 
post-shock test [F(2, 21) = 0.2022, p = 0.8], though there was 
a main effect of session [F(1, 21) = 15.8571, p < 0.001]. These 
data suggest that inhibition of DA D1 or activation of DA D2 
receptors does not impact latency to enter the reward-paired 
chamber in situations where there is no conflict, but decreases 
compulsive-like reward seeking after the risk of aversive outcome 
has been learned. Post-shock latencies to enter the reward-paired 
chamber after infusions of DHX or sulpiride, however, did not 
differ from saline-treated mice (p > 0.7), indicating that DA 
D1 agonism or D2 antagonism did not impact compulsive-like 
reward seeking. 

FIGURE 1 | D1 antagonism or D2 agonism restores goal-directed 
behavior. (A) Experimental timeline. Mice only received infusions of drugs 
during counterbalanced test sessions. One half of mice received the 
degraded session first while other mice received a non-degraded session 
first. (B) Only mice whose cannulae were placed within the IL were 
included in analyses. Images modified from Paxinos and Franklin (2001). (C) 
Inhibition of D1 signaling with SCH23390 or agonism of the D2 receptor 
with quinpirole in the IL resulted in reduced responding only during the 
degraded session, consistent with restoration of goal-directed behavior. 
Error bars ±SEM. *p < 0.05. 

FIGURE 2 | D1 antagonism or D2 agonism reduces compulsive-like 
reward seeking. (A) Experimental timeline. Mice received infusions of 
drugs into the IL only during test sessions to ensure that effects were on the 
expression, not acquisition, of compulsive-like behavior. (B) Only mice 
whose cannulae could be confirmed to be within the IL were included in 
behavioral analyses. Images modified from Paxinos and Franklin (2001). (C) 
Antagonism of the D1 receptor with SCH23390 or agonism of the D2 
receptor with quinpirole in the IL reduced compulsive-like reward seeking as 
indicated by an increase in latency to enter the reward-paired chamber only 
after pairing with foot shock (adverse consequence). *p < 0.05, **p < 0.01. 

During the pre-shock interval, only mice receiving DHX infu- 
sions showed an increased latency to enter the reward-paired 
chamber as compared to saline-treated mice (p < 0.05), suggest- 
ing that DA D1 agonism impacts latency to enter a reward-paired 
chamber under baseline conditions. Again, because mice received 
infusions prior to both the pre-shock and post-shock test ses- 
sions, we do not think that the ability of SCH23390 or quinpirole 
to produce increased post-shock latencies to enter the chamber 
is reflective of altered activity levels or motivation to enter 
the chamber. To confirm that these manipulations did not gen- 
erally increase latencies to enter both the paired and unpaired 
chambers in the post-shock session, a rmANOVA was performed 
(shock × drug). The analysis revealed a main effect of drug on 
latency [F(4, 42) = 3.26, p = 0.02] and a main effect of session 
[F(1, 42) = 5.54, p = 0.02], but no shock × drug interaction (p = 
0.21), suggesting that neither SCH23390 nor quinpirole 
interacted with shock exposure to increase latency to enter both 
chambers. Further, these data suggest that exposure to these drugs 
during the pre-shock session did not result in a generalized aver- 
sion to both chambers in the post-shock test session. Follow-up 
analyses indicated that SCH23390 administration resulted in an 
increased latency to enter the unpaired chamber in both the pre- 
and post-shock sessions. 

DISCUSSION 

These experiments investigated the role of specific manipulations 
of IL DA D1 and D2 receptor signaling in flexible reward seek- 
ing. We found that after extended training in an instrumental 
task, at a time point when control animals were insensitive to 
changes in contingency, inactivation of DA D1 or activation of 
DA D2 receptors in the IL was sufficient to render mice sensitive 
to the change in the relationship between action and outcome. 
That is, either a decrease in DA D1 activity or an increase in 
DA D2 signaling resulted in restoration of goal-directed behavior 
after the transition to habit. Conversely, we saw that neither DA 
D1 agonism nor DA D2 antagonism had any impact on behav- 
ior after extended training, indicating that it is not a general 
change in the ratio of D1 to D2 signaling that produced this 
increased sensitivity to action-outcome relationship, but rather 
specific decreases in DA D1 activity or increases in DA D2 sig- 
naling allowed alterations in behavior. Importantly, these studies 
only investigate one form of loss of action-outcome relationship, 
and future research will be necessary to determine whether selec- 
tive infralimbic DA manipulations alter flexible responding in 
paradigms that disrupt contingency through provision of alter- 
native reinforcers, reversal of the action-outcome contingency 
through selective reinforcement of non-responding, or under 
conditions of extinction. 

In addition to restoration of goal-directed behavior after 
extended performance of an instrumental response, we simi- 
larly showed that D1 antagonism and D2 agonism in the IL 
reduced compulsive-like reward seeking in a task investigating 
competition between adverse consequences and reinforcement. 
Importantly, we again saw no effects of infralimbic D1 agonism 
or D2 antagonism on the ability to restore behavioral flexibil- 
ity. The increase in latency to enter the reward-paired chamber 
in mice receiving IL infusions of the DA D1 antagonist or DA 
D2 agonist occurred only after animals received a foot shock in 
the same chamber, indicating that these DA manipulations dur- 
ing the test did not impair either the ability to move toward the 
chamber or motivation to enter the reward paired chamber in 
the absence of conflict, i.e., prior to foot shock. Notably, IL DA 
D2 signaling has been shown to be critical for the extinction of 
conditioned fear (Mueller et al., 2010). However, we do not think 
this finding in any way contradicts our conclusion that IL DA D2 
activity reduces compulsive reward seeking as infusion of the DA 
D2 agonist increases latency to enter the shock- and reward-paired 
chamber, indicating that extinction has not occurred. Together, 
these data suggest that increased DA signaling through D2-like 
receptors in the IL restores flexible behavior, while DA D1 activity 
in the IL may be related to reduced sensitivity to action-outcome 
relationships, including a loss of such relationships through con- 
tingency degradation, and the risk of adverse consequences, as 
loss of signaling at this receptor restores flexible behavior. 

Our lab, and several others, have long been interested in the role 
of corticostriatal dysfunction in inflexible, habitual, addiction- 
related processes (e.g., Jentsch and Taylor, 1999; Robbins and 
Everitt, 1999). We have previously demonstrated that administra- 
tion of exogenous DA into the IL restored goal-directed behavior 
in animals performing habitually (Hitchcott et al., 2007); our cur- 
rent data suggest that this effect was mediated by activity at DA D2 
receptors. Importantly, our current work focuses on the ability 
of DA manipulations to restore sensitivity to changes in action- 
outcome contingency, without investigating the role of change in 
outcome value. While in many cases response strategy selection 
in these paradigms is consistent, it is possible that the ability to 
track action-outcome relationships is dependent on IL DA signal- 
ing in a way that is separate from the ability to regulate responding 
for a devalued outcome, and this has yet to be determined. DA 
has been shown to differentially affect PFC function depend- 
ing on the task used and the dose tested. For example, DA is 
thought to impact measures of prefrontal function, such as work- 
ing memory, in a dose-dependent manner through D1-mediated 
alterations in the signal-to-noise ratio (e.g., Arnsten, 2007). Our 
data indicate that in assessments of habit, exogenous DA is pri- 
marily acting through DA D2 receptors to decrease infralimbic 
activity, which is consistent with the ability of both D2 agonists 
and DA to restore flexible reward seeking. In addition, this finding 
reconciles the data from studies indicating that both DA infusions 
(Hitchcott et al., 2007) and inactivation of the IL restore goal- 
directed behavior (Coutureau and Killcross, 2003). The activation 
of DA D1 or D2 receptors has distinct and opposing downstream 
effects. DA D1 receptors are Gαs-coupled, and their stimulation 
results in increased production of cyclic adenosine monophos- 
phate (cAMP) and activation of the cAMP-dependent protein 
kinase (PKA). Activation of Gαi/o-coupled DA D2-like receptors, 
however, inhibits adenylyl cyclase activity, directly opposing DA 
D1 activity and downstream signaling. In addition to inhibition of 
pyramidal cells through the above described mechanism, DA D2 
activation may further inhibit projection neurons through 
enhancement of GABAergic interneuron activity (Tseng and 
O'Donnell, 2007a). Enhanced signaling at infralimbic DA D2-like 
receptors relative to D1 receptors is likely to result in decreased 
neuronal activity. Based on the evidence that inactivation or lesion 
of the IL also impairs the expression of stimulus-response habits 
(Coutureau and Killcross, 2003; Killcross and Coutureau, 2003), we 
propose that the ability of DA infusions in the IL to reinstate 
sensitivity to the action-outcome relationship is due to decreased 
activity and that the balance of D1/D2 activity in the IL is critical 
to the expression of flexible reward-seeking behavior. 

Though a precise role for infralimbic DA D1 and D2 signal- 
ing in habitual and compulsive-like reward seeking has not been 
previously investigated, the IL has been implicated in situations of 
response conflict (Haddon and Killcross, 2011). Further, a role for 
prefrontal DA signaling has also been investigated in other mea- 
sures of flexible behavior. Blockade of DA D1 or D2 in the medial 
PFC has been shown to impair the ability to update behavior to a 
change in reward value, while not impacting the ability to perceive 
the change (Winter et al., 2009). Additionally, DA D2 antagonism 
impaired flexibility in a set-shifting task, though agonism of DA 
D2 did not promote shifting (Floresco et al., 2006). Inhibition of 
the DA D4 receptor, a member of the D2-family of receptors, had 
opposing effects on set shifting. Consistent with these findings, 
it is possible that the effects of DA D1 inhibition and D2 activa- 
tion in our experiments result not from a change in infralimbic 
activity, but rather through changes in PFC network stability. It 
has been suggested that DA D1 activity can stabilize the existing 
PFC networks, potentially explaining why loss of DA D1 signaling 
can promote flexible behavior through loss of this stabilization 
(e.g., Seamans and Yang, 2004; Durstewitz and Seamans, 2008). 
In this model, and consistent with our findings, DA D2 signaling 
would promote system lability through reduction in signaling in 
the GABAergic neurons, thus enabling the establishment of new 
behavioral patterns. The basis for this model, however, is work 
done in adolescent animals (e.g., Seamans et al., 2001) in which 
the DA D2 impact on GABAergic signaling may be different (i.e., 
opposite) from that seen in adult animals (Tseng and O'Donnell, 
2007b; O'Donnell, 2010); however, the discrepancy between these 
findings does not appear to be solely dependent on age (Kroener 
and Lavin, 2010). It therefore remains unclear whether in adult 
animals, D2 activation in the IL may act to reduce GABAergic 
inhibition of pyramidal cells, or perhaps, as described above, to 
produce a net decrease in IL activity. 

Barker et al. | Infralimbic DA in flexible behavior | www.frontiersin.org | July 2013 | Volume 7 | Article 126 

As our data suggest that a selective shift in the DA D1:D2 ratio 
in the IL can enable a shift in response strategy selection, it is 
important to consider that the observed separation between D1 
and D2 effects may result from downstream influences on dis- 
tinct neuroanatomical targets. It has been well established that 
in the striatum, DA D1- and D2-receptor containing medium 
spiny neurons are located in distinct populations of neurons 
that have separate projection targets. Indeed, striatal D1- and 
D2-receptor containing neurons that participate in the direct 
and indirect pathways, respectively, have been shown to differ- 
entially contribute to the attribution of value to an action and, 
therefore, inform response selection in a distinct but comple- 
mentary fashion (Tai et al., 2012). While there is evidence that 
PFC neurons may co-express DA D1- and D2-type receptors 
(Vincent et al., 1995), it has also been demonstrated that D1 
and D2 containing neurons are at least in part distinct popu- 
lations (e.g., Gaspar et al., 1995; Gee et al., 2012). It may be 
that DA D1- and D2-expressing projection neurons in the IL 
also have separate targets and that modulation of DA D1 and 
D2 signaling differentially impacts downstream brain regions, 
therefore enabling a shift in contribution to response strategy 
selection between these targets. For example, it has been shown 
that disconnection of the IL from the nucleus accumbens shell can 
replicate the effects of IL inactivation on cocaine seeking (Peters 
et al., 2008). In addition to the nucleus accumbens shell, the IL 
also projects extensively to amygdalar nuclei (e.g., Vertes, 2004). 
Though the central nucleus of the amygdala has been shown to 
interact with the dorsolateral striatum to mediate the expres- 
sion of goal-directed and habitual behavior (Lingawi and Balleine, 
2012), the effect of IL disconnection from its targets on habitual 
and compulsive reward-seeking behavioral control is still under 
investigation. 

The precise role the IL plays in response strategy selection and the 
mechanism by which decreased activity in the IL would restore 
goal-directed behavior remain to be elucidated. Studies by Rich 
and Shapiro (2009) indicate that infralimbic activity lags behind 
response switching, while PL activity leads the change, suggest- 
ing perhaps that the IL is involved in the maintenance of habits while 
activity in the PL is required to flexibly update responding. Loss 
of the IL may result in a reversion to the competing memory sys- 
tem that uses knowledge of the action-outcome relationship and 
outcome value to guide behavior. 

SUMMARY AND IMPLICATIONS 

The ability to behave flexibly is critical to the successful control of 
reward seeking, and a better understanding of the mechanisms by 
which response strategies shift away from those that are habitual 
or compulsive to those that are goal-directed is likely to inform 
treatment of both drug and food addiction. Here, we show that 
increased D2 receptor or decreased D1 receptor activity in the 
IL can restore sensitivity to changes in action-outcome contin- 
gency and decrease reward seeking in the face of punishment. 
Importantly, these data help to explain the apparent discrepancy 
between the ability of infusions of DA and inactivation of the IL to 
enable a shift in response strategy, and will help to inform future 
work investigating the precise role that IL plays in facilitating 
plastic behavior. 